Efficiently handling constraints in mixed-integer nonlinear programming problems using gradient-based repair differential evolution

Mixed integer nonlinear programming (MINLP) addresses optimization problems that involve continuous and discrete/integer decision variables, as well as nonlinear functions. These problems often exhibit multiple discontinuous feasible parts due to the presence of integer variables. Discontinuous feasible parts can be analyzed as subproblems, some of which may be highly constrained. This significantly impacts the performance of evolutionary algorithms (EAs), whose operators are generally insensitive to constraints, leading to the generation of numerous infeasible solutions. In this article, a variant of the differential evolution algorithm (DE) with a gradient-based repair method for MINLP problems (G-DEmi) is proposed. The aim of the repair method is to fix promising infeasible solutions in different subproblems using the gradient information of the constraint set. Extensive experiments were conducted to evaluate the performance of G-DEmi on a set of MINLP benchmark problems and a real-world case. The results demonstrated that G-DEmi outperformed several state-of-the-art algorithms. Notably, G-DEmi did not require novel improvement strategies in the variation operators to promote diversity; instead, an effective exploration within each subproblem is under consideration. Furthermore, the gradient-based repair method was successfully extended to other DE variants, emphasizing its capacity in a more general context.


INTRODUCTION
Mixed integer nonlinear programming (MINLP) problems represent the subset of optimization problems that involve continuous and discrete/integer decision variables, as well as nonlinear functions.In engineering and related fields, discrete and integer variables are commonly used to represent values restricted to specific sets.Discrete variables are often used when values are limited to standard elements, such as rod diameters, structural beam dimensions, and commercial resistance values.Integer variables are frequently employed to represent groups of identical elements, such as the number of products manufactured, pins in an electrical connector, or gear teeth in a gear train.
A MINLP problem can be expressed by Eq. ( 1): min f ðx; yÞ subject to : g i ðx; yÞ 0; i ¼ 1; …; n h j ðx; yÞ ¼ 0; j ¼ 1; …; m x l k x k x u k ; k ¼ 1; …; n 1 y l q y q y u q : integer; q ¼ 1; …; m 1 ½x; y 2 (1) where f ðx; yÞ is the objective function (OF) of the problem, x is a vector of continuous variables, y is a vector of integer variables.x l k and x u k are the lower and upper bounds of x k , respectively, whereas y l q and y u q are the lower and upper bounds of y q , respectively.The decision variable space is defined by , and g i ðx; yÞ and h j ðx; yÞ denote the ith inequality constraint and the jth equality constraint, respectively.
One key challenge in MINLP problems is that the presence of integer variables generates a nonconvex search space, which can lead to a combinatorial explosion in the number of possible solutions.This is because the discrete variables create a set of discontinuous feasible parts, and the optimization algorithm must explore each part separately.Figure 1 illustrates a generic MINLP problem with a search space defined by two variables, x and y.The variable x can take all real values, and y is limited to integer values 0; 1; …; 4 f g .The gray area is the feasible region defined by the constraints.Contour lines connect points from discontinuous parts with equal fitness values through steps.However, these steps solely provide a visual indication of the function's behavior in the search space, which cannot be evaluated for non-integer values of y.The figure reveals another important aspect of MINLP problems: the high variability in the sizes of the discontinuous feasible parts (red lines).This phenomenon is attributed to the coexistence of constraints and integer variables.
Stochastic optimization methods such as EAs have been updated to address the challenges posed by MINLP problems, e.g., genetic algorithms (Deep et al., 2009), evolution strategies (Costa & Oliveira, 2001), differential evolution (Lampinen & Zelinka, 1999), particle swarm optimization (PSO) (Wang, Zhang & Zhou, 2021), ant colony optimization (Liao et al., 2013), among others (see Boukouvala, Misener &Floudas, 2016 andPloskas &Sahinidis, 2022 for a comprehensive survey of algorithms and software).EAs have successfully been applied to solve remarkable MINLP problems, such as the design of an interplanetary space mission trajectory (based on the Galileo mission from NASA in 1989) (Schlüter et al., 2013), optimal control problems for an F8 aircraft (Sager, 2011), as well as optimal batch plant design problems (Ponsich & Coello, 2011;Ji & Gu, 2024).Recently, EAs have been used to optimize hyperparameters for convolutional neural networks (Liu & Wang, 2023), reduce neural network structure complexity (Sildir, Sarrafi & Aydin, 2022), and design a multimodal hub-and-spoke transportation network for emergency relief during the COVID-19 pandemic (Li et al., 2023).However, premature convergence remains a major concern for EAs when addressing MINLP problems with variable-size discontinuous feasible parts.In such cases, the population tends to converge on the biggest feasible parts because the EAs tend to detect feasible solutions more easily within these regions.Consequently, this approach may limit the exploration of smaller feasible parts that could contain better solutions.
As can be seen in Fig. 1, a MINLP problem can be expressed as a set of k subproblems, where k corresponds to all the admissible values of y, in this case, 0; 1; ::; 4 f g .Note that the global minimum is the optimal solution for all subproblems.However, due to the dimension of k in several problems, its overall combinatorial enumeration could be computationally expensive or even prohibitive.Two important issues are derived from this approach: (i) an efficient combinatorial exploration in the space of discrete variables, and (ii) an efficient exploration and exploitation within each generated subproblem.
Commonly used operators in EAs are often blind to constraints, making it difficult to solve highly constrained problems (Craenen, Eiben & Marchiori, 2001;Hamza, Essam & Sarker, 2015).This has a high cost for algorithm performance in solving highly constrained MINLP problems due to the challenge of generating feasible solutions.To address this point, the operators in diverse proposals have been modified to achieve higher diversity and more opportunities to explore each subproblem.As optimization problems become more complex, particularly within the realm of MINLP problems, the challenge of effectively exploring highly constrained spaces becomes a substantial obstacle.This work proposes a DE variant with a gradient-based repair method for MINLP problems (G-DEmi).Unlike previous methods focused on diversification to enhance performance, G-DEmi takes a distinctive approach.It attempts to improve promising but infeasible solutions in different subproblems by using gradient information extracted from the constraint set.The proposal was evaluated using benchmark problems selected from   Datta & Figueira (2013), Liu et al. (2023), andMINLPLib (2023), which encompassed several challenges posed by mixed variables.Additionally, the subway optimization problem was also addressed as a real-world problem.The proposal was rigorously compared with other state-of-the-art EAs using descriptive and inferential statistics.The results show that the proposal consistently outperforms competitors in terms of solution quality and algorithm robustness.Notably, G-DEmi achieved high success rates even in highly constrained problems where other algorithms failed to find feasible solutions.These findings indicate that the gradient-based repair method significantly contributes to the performance of G-DEmi across a wide range of MINLP problems.This algorithm proved its efficiency in exploring each generated subproblem since it did not require new operators to promote a higher diversity but the conventional DE operators.Furthermore, the repair method was successfully extended to other DE variants, highlighting its robustness and effectiveness in a more general context.This article is organized as follows: "Related Work on Evolutionary Algorithms for MINLP Problems" presents an overview of the state-of-the-art works referred to the development of EAs for MINLP problems."Base Methods" gives a brief description of DE and the gradient-based repair method.The proposed algorithm, G-DEmi, is introduced in "Proposed Algorithm"."Experimental Study" describes the experimental study conducted on G-DEmi, and "Real-World Case Study" addresses the real-world case study."Conclusions" presents the final considerations.

RELATED WORK ON EVOLUTIONARY ALGORITHMS FOR MINLP PROBLEMS
In the field of MINLP, several techniques have been employed to address its complexities.The initial approach involved genetic algorithms with binary coding to represent integer and real variables (Li & Gen, 1996).This approach has been extended to other algorithms such as particle swarm optimization (Datta & Figueira, 2011), differential evolution (Datta & Figueira, 2013), and artificial bee colony (Wang et al., 2017).However, the effectiveness of these techniques is significantly affected by problem dimensionality and the ranges of the integer and real variables.
To address the issues of integer variables, some studies proposed truncation or rounding methods during the optimization process (Deep et al., 2009;MathWorks, 2023).However, these operations may introduce search bias or drastic reductions in diversity, potentially impacting the performance of the algorithm.More refined strategies have been proposed instead of simple rounding or truncation.A triangular mutation rule was introduced by Mohamed (2017), which is based on three randomly selected vectors and the difference vectors between the best, medium, and worst individuals within that triplet.This operator was designed to improve the global and local search capabilities and to increase the convergence speed of DE.Jalota & Thakur (2018) proposed a variant of GA with BEX crossover and power mutation.Additionally, a truncation technique based on the floor and ceiling function was applied to maintain the randomness in the evolutionary process.
Another variation of DE with cluster strategy into the mutation operator was presented by Mohamed et al. (2019).This approach selects three random elements from different clusters to perform the mutation.Song et al. (2023) proposed a hybrid sine and cosine algorithm that uses a position update formula to enhance its search capability.In addition, they proposed three mutation strategies with different balances between exploration and exploitation to prevent local convergence.
Another approach involves estimation-of-distribution algorithms (EDAs) combined with different strategies for solving MINLP problems.Wang et al. (2019)  Other studies have presented specific strategies for directly addressing discrete variables.Schlüter (2012) proposed a discretized version of the multi-kernel function for the mixedinteger distributed ant colony optimization (MIDACO) algorithm.MIDACO employs an oracle penalty function proposed by Schlüter & Gerdts (2010), which significantly biases the search towards promising regions of the solution space.Lin et al. (2018) proposed a setbased DE approach that preserves the original search mechanism, avoiding space transformation.Liu, Li & Ge (2022) proposed a hybrid quantum annealing-double-elite spiral search algorithm to address MILNP problems.The algorithm was applied to solve the integer subproblems using a quantum-tunneling-based annealing mechanism.
Recent works have focused on overcoming the limitations due to search spaces in MINLP problems.Liu et al. (2021b) proposed a multi-objective DE aimed at identifying solutions that equally satisfy integer and fitness conditions.In Liu et al. (2021a), a cutting and repulsion strategy was proposed to suppress unpromising discontinuous feasible parts during exploration and to escape from local optima.In Liu et al. (2023), a surrogateassisted DE was proposed, featuring an adaptive pre-screening operation to prevent excessive exploitation in local regions, which restricts the number of individuals sharing the same integer values.Hamano et al. (2022) suggested a strategy to generate integer variables with a lower-bounding marginal probability model to avoid stagnation caused by the discretization granularity in the search space.In Molina-Pérez et al. (2022, 2023), an EDA was presented, where a link strategy between the histogram model and the e-constrained method reduces the influence of the larger discontinuous feasible parts in the exploration.A kriging-assisted DE was proposed to deal with mixed-integer variables by Liu et al. (2024).Promising regions near the feasible region are created to promote exploration from infeasible regions.Solutions are then guided to promising feasible regions through local search.Molina-Pérez et al. (2024) proposed a DE variant that explores promising regions from infeasible contours by considering a group of "good fitnessinfeasible solutions".This approach aims to reduce the vulnerability of solutions to being attracted by non-promising large discontinuous feasible parts.
While numerous proposals exist to address MINLP problems, certain studies have focused on particular problems or relatively simple problem sets.In several cases, the implementation of additional operators has been required to prevent premature convergence in the search space, although the underlying reasons for this have not been explicitly stated.Only in recent works the distinctive landscape that mixed variables create within the search space has been clearly addressed.This perspective, involving disconnected feasible regions of varying sizes, has enabled more efficient strategies for solving increasingly complex problems.
In this context, gradient-based repair methods have shown high effectiveness in overcoming the limitations of EAs in handling constraints.The first and most commonly used technique was proposed by Chootinan & Chen (2006).It uses Newton's root-finding method to repair infeasible solutions.More recently, several gradient-based methods have been introduced, including repair method on surrogates of the constraint functions (Koch et al., 2015), random direction repair method (Xu et al., 2021), heuristic repair with historical information (Du et al., 2022), and adaptive repair method (Yang et al., 2023).However, these tools haven't received proper attention in MINLP problems, despite their potential utility, especially considering the search space comprising multiple constrained subproblems.We propose G-DEmi, with a gradient-based repair method to rectify promising infeasible solutions in different subproblems using gradient information derived from the constraint set.Unlike prior methods, this novel approach aims to directly address the constraint challenges by reducing the insensitivity of DE to constraints in MINLP problems.Notably, G-DEmi improves the performance without needing new operators for higher diversity than those employed in the conventional DE.

Differential evolution
Storn & Price (1997) proposed a population-based optimization algorithm, DE, that has gained popularity for its high success rate and straightforward implementation.DE consists of four fundamental processes: initialization, mutation, crossover, and selection, which are repeated in each cycle.
The initial population P is created through random sampling from the search space.Each individual in the population, denoted as x i ¼ ðx i;1 ; …; x i;j Þ, represents a solution to the problem with x i being the ith individual, i ¼ 1; …; NP. j is the number of decision variables and NP is the population size.A mutant vector v i ¼ ðv i;1 ; …; v i;j Þ is generated for every individual x i as the base of the evolution.Several mutation operators have been proposed (Abbasa, Ahmadb & Jabeenc, 2017).In this study, the following versions were employed: . DE/rand/1: (2) .DE/current-to-rand/1: . DE/best/1: . DE/rand-to-best/1: where x best is the best individual in the population; vectors x r are different between each other and from x i , and randomly chosen from the population; and F is the scaling factor.
DE executes a crossover operation between the target vector x i and its mutant vector v i to generate the trial vector u i ¼ ðu i;1 ; …; u i;j Þ.The binomial crossover is defined in Eq. ( 6), where the control crossover parameter CR is a value in ½0; 1.A uniformly distributed random number rand j is generated for each decision variable j in the interval ½0; 1, and j rand is a random integer in the range ½1; NP.
During the selection step, the best individual is chosen between x i and u i to be carried forward to the next generation.A set of feasibility rules (Deb, 2000) is employed to compare them as follows: 1. Between two infeasible solutions, the one with lower constraint violation is preferred.
2. If one solution is infeasible and the other one is feasible, the feasible solution is preferred.
3. Between two feasible solutions, the one with better objective function value is preferred.

Gradient-based repair method
The technique of gradient-based repair was first introduced for genetic algorithms (Chootinan & Chen, 2006).Takahama & Sakai (2009) incorporated this approach in a modified proposal version of the constraint handler e-constrained.This method uses the gradient information of the constraint set to guide the infeasible solutions toward feasible regions.The gradient can be obtained directly from the constraints or, if the analytical derivation is challenging, it can be estimated by numerical techniques such as the finite differences (Chapra & Canale, 2010).
VðxÞ is defined by Eq. ( 7) as a vector that contains the violation degree for each inequality (g) and equality (h) constraint in a given problem, for a particular solution vector x.Parameters n and m are the number of inequality and equality constraints, respectively, and e specifies the tolerance for equality constraints.Preserving the sign of the equality violation is a crucial aspect of the method and is represented by the function sgnðhÞ.

VðxÞ
The gradient matrix of these constraints with respect to the N components of x, denoted as rVðxÞ, is defined by Eq. ( 8): The forward approximation of finite differences provides an estimate of these derivatives, as defined in Eq. ( 9), where Dx represents the step size and e j is a unitary vector of the same dimension as x, with a value of 1 for the jth component and 0 for the rest.
This repair method aims to transform x into a feasible solution, which involves setting the elements of the vector VðxÞ to zero.By employing Newton-Raphson's method (Burden, Faires & Burden, 2015), a repaired vector x kþ1 can be iteratively obtained through Eq. ( 10), which represents a linear approximation of Vðx k Þ in the direction of the origin.However, it is common that the number of variables differs from the number of constraints.In this case, the rVðx k Þ matrix is non-invertible and the Moore-Penrose pseudoinverse (Campbell & Meyer, 2009) must be used, as shown in Eq. ( 11): where x kþ1 and x k represent the updated and current values of the vector x, respectively, rVðx k Þ À1 and rVðx k Þ þ represents the inverse and the Moore-Penrose pseudoinverse matrix of the gradient matrix rVðx k Þ, respectively.A computationally efficient way of finding rVðx k Þ þ is by employing singular value decomposition (Barata & Hussein, 2012).
Before proceeding with the pseudoinverse, removing all zero elements of Vðx k Þ and their corresponding values in rVðx k Þ is important.

Gradient-based repair method for MINLP problems
As mentioned above, MINLP problems can be analyzed as multiple subproblems, some of which may be highly constrained.This impacts the performance of EAs, whose operators are typically insensitive to constraints, leading to the generation of many infeasible solutions.We propose a basic DE implementation with a gradient-based repair method for MINLP problems, G-DEmi.The repair strategy explores the subproblems independently to improve the exploration inside them.Specifically, for repairing a vector with mixed variables [x, y], only the real variables x are modified while the integer variables y remain fixed.
In contrast to the variants previously described (Chootinan & Chen, 2006;Takahama & Sakai, 2009) with a probability-based approach for repairing solutions, our method repairs only trial vectors u that satisfy two conditions: (i) they lost the tournament against their target vectors, but have a better objective function value, and (ii) they belong to a subproblem (y) where no solution has been repaired in the current generation.These conditions aim to promote the repair of trial vectors with a higher potential to improve OF, exploring each subproblem independently without repairing similar solutions multiple times.
The repair method is explained in Algorithm 1.A mixed trial vector u ¼ ½x u k ; y u T is defined, where only the x u k component is updated during the iterative repair process.As a result, the constraint violation degree vector and the gradient matrix can be found by Algorithm 1 Gradient-based repair method.Input: u; k max ; T min ; Output: u; 12) according to Eq. ( 7); 4: calculate rVðx u k ; y u Þ in Eq. ( 13) according to Eq. ( 8); 5: remove all zero elements of Vðx u k ; y u Þ and their corresponding values in rVðx u k ; calculate x u kþ1 according to Eq. ( 11); 8: 11: end while Stopping criteria: . k !k max : maximum number of iterations reached.
. Vðx u k ; y u Þ ¼ 0: all elements of the constraint violation degree vector are zero. .T u T min : maximum absolute difference between x u kþ1 and x u k is equal or lower than T min .
Eqs. ( 12) and ( 13), respectively.Both elements are updated in lines 3 and 4 of Algorithm 1.After removing all zero elements of Vðx u k ; y u Þ and their corresponding values in rVðx u k ; y u Þ, the pseudoinverse rVðx u k ; y u Þ þ is determined in line 6.Finally, the trial vector is updated between lines 7 and 9.The algorithm stops when all elements of vector Vðx u k ; y u Þ become zero, indicating a feasible trial vector.Additionally, two stopping criteria are employed to prevent long repair cycles that could cause arbitrary increases in execution time.The first is the maximum number of iterations, which stops the process after k max iterations.The second is the minimum tolerance, which stops the process if the maximum variation in the trial vector during repair is lower than T min .
The repairing procedure is illustrated with the following example: consider a scenario with an inequality and an equality constraint, given by Eq. ( 14).
As can be seen, only h 1 ðx u 1 ; y u Þ was violated.Therefore, the element of g 1 ðx u 1 ; y u Þ in Vðx u 1 ; y u Þ and their corresponding values in rVðx u 1 ; y u Þ need to be removed.This leads to Vðx u 1 ; y u Þ ¼ À1:4999.Then, rVðx u 1 ; y u Þ and its pseudoinverse rVðx u 1 ; y u Þ þ are computed as shown in Eq. ( 17).
Using Eq. ( 11), the vector x u 2 is generated as follows: The updated vector Vðx u 2 ; y u Þ is obtained by Eq. ( 18).As can be seen, the values of ðx u 2 ; y u Þ satisfy all constraints.Therefore, the trial vector has been successfully repaired, and its new values are u ¼ ½2:75 1:75 1 T .
Figure 2 provides an explanation of the gradient-based repair method within the context of DE, for a hypothetical MINLP problem.The feasible region is represented by a gray area, and includes discontinuous feasible parts generated by discrete variables shown as red lines.The objective space is defined by three contour curves: the worst level, a better level, and the best level.In Fig. 2A, the standard DE variant illustrates two instances of targettrial selection.Notably, the target solutions generated infeasible trials, making the target vectors the survivors in both cases, as indicated by the arrow directions.On the other hand, the gradient-based repair DE variant takes into account that both trial solutions have a better value in the objective space than their respective target vectors.This implies that both trial vectors must be repaired and compete again with their targets.Figure 2B illustrates how the repaired vectors enter the feasible regions and outperform their targets.This analysis highlights the advantages of the gradient-based repair method over the traditional selection method for finding feasible solutions within potentially promising subproblems, thereby promoting exploration within better feasible regions.Given the insensitivity of DE to constraints, repair might be essential to achieve feasibility in highly constrained subproblems.

Overall implementation
Algorithm 2 describes G-DEmi.To generate the initial population P g in step 1, random real and integer values for the solution vector ½x; y are taken.and the constraint violation degree Gðx g Þ are then evaluated.In step 8, a trial vector u g;i is generated for each target vector x g;i , using mutation and binomial crossover (rand/1/bin).The integer variables in u g;i are rounded in step 9 before evaluating the vector in step 10.In the selection operation (steps 11 to 20), the trial vector is compared to its corresponding target vector.In step 11, the best solution is selected according to the feasibility rules described in "Base Methods".If the trial vector fails to improve its target but still has a lower objective function value ðf ðu g;i Þ < f ðx g;i ÞÞ, and no other vector of the same subproblem has been repaired in the current generation ðy u g;i = 2 YÞ, this trial vector is repaired in step 14.In step 16, following the feasibility rules, the better solution between the repaired vector and its target vector passed to the population of the next generation P gþ1 .Through these steps, G-DEmi generates a new population in each generation.numbers of continuous and discrete variables, respectively; IC and EC are the numbers of inequality and equality constraints, respectively; and AC is the number of active constraints for the best-known solution so far.
A performance evaluation was conducted to compare G-DEmi with six state-of-the-art algorithms.Four different versions of DE were considered, including MI-EDDE (Mohamed et al., 2019), BOToP (Liu et al., 2021b), FROFI (Wang et al., 2015), and SADE (Qin & Suganthan, 2005;Huang, Qin & Suganthan, 2006) Due to the stochastic nature of EAs, more than simply assessing the quality of outcomes is required to evaluate their performance.Hence, in this experiment, descriptive and inferential statistics were employed.To achieve this, each problem was solved with a maximum of 200,000 OF evaluations and 30 independent runs.The equality-constraint tolerance was set at 1 Â 10 À04 .A run was considered successful if jf ðx best Þ À f ðx Ã Þj 1 Â 10 À04 , where x Ã denotes the best-known solution and x best represents the best solution generated by the algorithm.All algorithms and their instances were executed using the same computing resources: an Intel Core i7 -4790 CPU @3.6 GHz and 32 GB RAM with Windows 10 Enterprise N and MATLAB 2023a (The MathWorks, Natick, MA, USA).
The iRace parameter tuning tool was used to select parameter sets for all algorithms (López-Ibáñez et al., 2016).The tuning process used a representative set of test problems consisting of F1, F3, F5, F7, F9, F10, F12, F14, F16, F18, F20, F22, F24, F26, and F28.For each algorithm, 2,000 experiments were conducted in iRace.The parameters used in each algorithm are presented in Table 2, where population size is denoted by NP, crossover rate by CR, scaling factor by F, maximum number of repair iterations by k max , minimum repair tolerance by T min , proportion of population partitions by P a , and initial tolerance for equality constraints by e 0 .Parameters t c and c p are the controllers of the e-constrained method.For algorithms with a learning process, L denotes the learning iteration count, a is the learning rate, r represents the chaotic search radius, c is the rate of parameter adaptation, p is the greediness of the mutation strategy, e is the link parameter, r M is the mutation rate, b represents the mutation factor, w denotes the number of histogram bins, e represents the end bins parameter, A c is the acceleration coefficient, and q represents the shrinking parameter.The failure threshold is denoted by T. In replacement strategies, NR indicates the number of surviving replacements, and d is the improvement tolerance.Additionally, l 1 and l 2 represent the count of weighted difference vectors.
As described above, G-DEmi rounds the values of integer variables after the rand/1/bin operation.The value of F suggested by iRace (higher than the 0.5 recommended by Storn & Price, 1997) can be effective in reducing the repeated integer values after rounding and encouraging the exploration of different discontinuous feasible parts.The value of CR indicates that crossover is also significant in this scheme to prevent accelerated convergence.The population of 50 individuals coincides with several successful DE experiments for MINLP problems (Mohamed et al., 2019;Liu et al., 2021a).

Benchmark results and analysis
The experimental results were analyzed using descriptive statistics.The selected indicators are widely recognized in the field for assessing algorithms, providing a comprehensive view of their performance (Liu et al., 2021b(Liu et al., , 2021a)).The feasible rate (FR) represents the percentage of runs in which the algorithm finds at least one feasible solution, whereas the successful rate (SR) represents the percentage of runs in which the algorithm finds the optimal solution within the given tolerance.Ave and Std denote the average and the standard deviation of the best OF values of each problem from its 30 independent runs.If an algorithm cannot achieve 100% FR, then the Ave and Std values are denoted as "NA".Mean FR and mean SR denote the mean feasible rate and the mean successful rate achieved by the algorithm over the twenty-eight test problems.T represents the average execution time for each problem in seconds.
Inferential statistics were employed to evaluate the statistical significance of the differences in results.A comparison between G-DEmi and the other algorithms was carried out using the Wilcoxon Rank-Sum Test (WRST) with a 0.05 significance level    (Derrac et al., 2011).Additionally, we applied the Bonferroni correction to control the family-wise error rate that arises from multiple pairwise comparisons (Dinno, 2015).The symbols indicating the superiority, inferiority, or similarity of our proposal compared to each competitor are denoted by "+," "−," and "%", respectively.Tables 3 and 4 present the statistical values of each problem, in separate rows.For example, in problem F1, all algorithms successfully found feasible solutions across all executions, yielding 100% FR.The success rates indicate that MI-EDDE failed every run (SR = 0), whereas the other competing algorithms consistently achieved the optimal solution (SR = 100).Based on WRST, G-DEmi was comparable to the other competitors,  Mean FR, mean SR, and the final count of WRST are provided in the last section of Table 4.It can be observed that G-DEmi reached feasible solutions in all executions for all problems, 100% of Mean FR, surpassing the 84.52% for MI-EDDE, 82.86% for BOToP, 85.83% for SADE-CaR, and 90.36% for FROFI-CaR.Note that the highly constrained condition of problems F19 and F26-F28 had no influence on the FR of G-DEmi.However, this condition significantly affected the performance of the competing algorithms.In terms of mean SR, G-DEmi obtained the highest value at 98.10%, followed by FROFI-CaR at 73.45%, SADE-CaR at 59.52%, BOToP at 46.79%, and MI-EDDE at 38.10%.The final count of WRST shows that G-DEmi significantly outperformed MI-EDDE in 20 problems, whereas both algorithms had similar results in eight problems, with MI-EDDE never surpassing the performance of G-DEmi.BOToP is significantly outperformed by G-DEmi in 17 problems, equaled in 11 problems, and never achieving better results than G-DEmi.Similarly, FROFI-CaR was surpassed by G-DEmi in 10 problems, whereas its results were similar for 17 problems, and it only significantly exceeded G-DEmi in one problem (F11).In this experiment, there were some outlier values that should be noted.For problems F1-F3, the solutions of MI-EDDE tended to converge towards local optima deliberately located in the most accessible discontinuous feasible parts.In contrast, problems F7 and F13, with equality constraints, affected the performance of other algorithms but not the G-DEmi proposal.In problems F8-F10, certain points were attractive to the algorithm population, with MI-EDDE and POToP showing the worst performances.Problem F11 has more than 10 real variables and a high sensitivity of the objective function to small variations in the decision variables.As a result, G-DEmi did not reach the solution before finishing the execution in 40% of the cases.Remarkable advantages of G-DEmi over PSOmv and EDAmv are shown in Tables S5 and S6 in Supplementary File S2.PSOmv and EDAmv obtained Mean FR values of 85.83% and 88.45%, and Mean SR values of 14.17% and 37.74%, respectively.G-DEmi outperformed PSOmv in 25 problems, with only three instances of similar performance.On the other hand, EDAmv was outperformed in 21 problems and only achieved similar performances in seven problems.G-DEmi did not exhibit inferior performance with respect to PSOmv and EDAmv in any case.
The experiments involving the gradient-based repair method with other DE variants are reported in Tables 5 and 6.These tables present the results of CJADE-CaR vs. G-CJADEmi and DE-CaR+S vs. G-DEmi+S.The comparison highlights that G-CJADEmi outperformed CJADE-CaR, achieving 100% of Mean FR and 90.48% of Mean SR compared to 98.81% and 69.05%, respectively.G-CJADEmi exhibited better results in 12 problems and similar results in 14 problems, but CJADE-CaR performed better in problems F15 and F23.In the second competition, G-DEmi+S surpassed DE-CaR+S with a Mean FR of 99.52% vs. 90.24%and a Mean SR of 91.55% vs. 83.93%.G-DEmi+S achieved better results in seven problems, with no significant difference in 19 problems, and was outperformed in F12 and F23 by DE-CaR+S.In general, the variants with gradient-based repair exhibited notable superiority, with G-DEmi as the most promising in terms of feasibility and success.
Additionally, an analysis was conducted to examine the convergence of G-DEmi during the evolutionary process.Three problems were selected for this: F3 with two variables, F12 with 15 variables and high sensitivity of OF to changes in decision variables, and F27 highly constrained.Figure 3 shows the convergence curves of the median solutions in terms of solution quality for problem F12.Each curve starts from the generation of at least one feasible solution.A star marker on each curve shows when the algorithm reached the      global solution.It is observed that G-DEmi converged rapidly and found the global optimum before 100,000 evaluations (blue point).In contrast, the other algorithms consumed 200,000 evaluations without achieving this result.In Supplemental File S2, Figs.
S1 and S2 show the curves for F3 and F27, indicating an even more favorable outcome for G-DEmi.
Likewise, the impact of the parameters k max and T min on the performance of G-DEmi was analyzed through the described problems.For each problem, 30 executions of 200,000 evaluations were carried out using different combinations of them, while the other parameters remained fixed at previously reported values.In Supplemental File S2, Tables S2, S3, and S4 show the average execution times [s] and the average OF values for each instance.The best values are highlighted in bold.In general, the performance was adversely affected by the option of no repair (k max ¼ 0), and better performances were seen as k max increased and T min decreased.However, it is important to note that a longer repair process resulted in higher time consumption.Therefore, the execution times could be a determining factor for reasonable values such as those used.Values of k max equal to 50 and 100, and T min of 1 Â 10 À60 and 1 Â 10 À80 proved effective in this study.The consistency of the performance suggests that both parameters have a robust behavior concerning the algorithm's performance.
In summary, the comprehensive analysis of the applied metrics reveals that the general performance of G-DEmi surpassed the other state-of-the-art algorithms.Notably, WRST confirmed the statistical significance of the observed differences in results.The variety of the solved problems indicates that the gradient-based method contributes to the performance of G-DEmi in a wide range of MINLP problems.Even in highly constrained problems with challenging feasible regions, the proposed approach has consistently demonstrated remarkable success rates.The successful integration of the gradient-based repair method across both standard and advanced versions of DE emphasizes the robustness and effectiveness of this approach within the context of this evolutionary algorithm.
It is important to note that, unlike the methods that promote diversity using additional variation operators such as composite trial generation, self-adaptive parameters, or dynamic parameters, G-DEmi uses the conventional variation operators of DE/rand/1/bin.The success of G-DEmi is due to the efficient search process within each subproblem provided by the gradient-based repair method.The proposed repair method compensates, to a large extent, the insensibility of DE to constraints in the context of MINLP problems.

REAL-WORLD CASE STUDY Description
An important number of real-world optimization problems are MINLP problems.For this reason, we selected a well-known real problem as a case study to evaluate the performance of G-DEmi.The subway optimization problem (SOP) was proposed in Bock & Longman (1982) for the New York subway system as a dynamic nonlinear mixed-integer control problem.The aim is to minimize the energy consumption of a train between two stations, taking into account the dynamic model and other constraints.The decision variables for this problem include the operating modes for the route (integer variables) as well as the switching times for each operating mode (real variables).A significant challenge in solving this problem is the presence of mixed variables, specifically in the form of time-dependent integer variables (Belotti et al., 2013).
In this study, we address the variant of SOP illustrated in Fig. 4 and described in Sager (2005), Sager, Bock & Reinelt (2009), Lee & Leyffer (2011).The subway starts with zero values of time, position, and velocity, and is required to come to a complete stop (zero velocity) at a position 2,112 ft ahead, within a maximum time of 65 s.A path constraint is added to a subset of the track, requiring that the subway's velocity never exceeds 24 mph (35.2 ft/s) from 1,200 ft to the end of the track.The initial conditions of the problem are represented by the values (0, 0, 0) for time, position, and velocity, while the final conditions correspond to (65, 2,112, 0).

Optimal control problem
The SOP is expressed as an optimal control problem in Eq. ( 19 The arrival time of the subway to the next station is denoted by the terminal time t f ¼ 65 s.The differential states x 0 ðÁÞ and x 1 ðÁÞ represent the position of the train in ft and velocity in ft/s, respectively.The subway can operate under four modes: wðÁÞ ¼ 1 in serial, wðÁÞ ¼ 2 in parallel, wðÁÞ ¼ 3 in coasting, and wðÁÞ ¼ 4 in braking.The Lagrange term for each of the four modes is defined in Eqs. ( 20)-( 22): Lðx; 2Þ ¼ Lðx; 3Þ ¼ 0; Lðx; 4Þ ¼ 0: The acceleration function f ðx; wÞ for each operating mode is represented by Eqs. ( 23)-( 26): In Eqs. ( 27)-( 29), the drag force per car (R) and the tractive force per car (T) are denoted in lb.The specific parameter values used in this model can be found in Supplemental File S3.

Decision variables
The present problem comprises eight operational states along the subway track.Each state is defined by a time interval in which one of the four operating modes is active.These states can be mathematically expressed by Eq. ( 30).

Numeric scheme
In this work, the single shooting method is used to convert the boundary value problem into an initial value problem.Therefore, the final conditions xðt f Þ are considered as two additional equality constraints.The dynamic model is then solved using Runge Kutta 45 with a relative error tolerance of 1 Â 10 À04 .The path constraint is activated at an unknown time, 1,200 ft away from the initial position, and an event detection technique is used to determine the exact time of constraint activation.The bisection method is employed with a tolerance of 1 Â 10 À06 to enclose the solution between the times before and after the event.Similarly, the same process is used to determine the exact time when the subway stops after the braking mode, and when the finishing line is crossed.

Results and analysis
The case study was solved using MI-EDDE, SADE-CaR, FROFI-CaR, CJADE-CaR, G-CJADEmi, DE-CaR+S, G-DEmi+S, and G-DEmi algorithms.BOToP was unsuitable for this problem because its procedure involves evaluating solutions without integrity conditions, which was impossible in this case.Each algorithm was executed 30 times with 100,000 evaluations per run.Table 7 shows the final results of each competing algorithm.As can be seen, MI-EDDE, SADE-CaR, CJADE-CaR, DE-CaR+S, and FROFI-CaR failed to find feasible solutions in any of the runs.In contrast, all DE variants with gradient-based repair consistently produced feasible solutions in all 30 runs, with standard deviation values consistently below 0.05.In this study, G-CJADEmi reported the best results, yielding 1.355 kWh as the best fitness solution, with a corresponding decision vector of: ;2:6747071;15:9101142;32:4524515;34:1250172;37:3175333;49:6483266;57:8272705;65;1;2;1;4;3;1;3;4: In Sager (2005), the best solution reported was 1.384 kWh.However, it is important to note that a fair comparison with Sager's result is challenging due to the unknown parameters of the numerical methods employed in that particular previous work.
Figure 5 shows the time-velocity curves of the best solutions obtained in this experiment, including the solution reported in Sager (2005).The current operation mode is represented by w, and the red points indicate operation mode changes.As can be seen, the behavior of the best solutions is based on three key factors that contribute to reducing energy consumption: (i) returning to cruising speed (w ¼ 1) after the initial acceleration (flattened zone of the curve), (ii) maintaining the path constraint active, and (iii) entering a coasting zone (w ¼ 3) before the final braking operation.The behavior exhibited by all of these solutions primarily consisted of these factors.exploring alternative approaches, such as direct methods for constraint repairing, is advisable.The integration of the gradient-based repair method into other evolutionary approaches could be a significant contribution, allowing for the assessment of its capabilities in a broader context.Additionally, future research will encompass the evaluation of G-DEmi in scalability, large-scale, and multi-objective mixed-integer problems.

Figure 1
Figure1Hypothetical example: The gradient area represents the feasible region defined by the constraints, and the red lines are the discontinuous feasible parts that also satisfy the integer constraints.Full-size  DOI: 10.7717/peerj-cs.2095/fig-1 , as well as two EAs based on different paradigms: PSOmv(Wang, Zhang & Zhou, 2021) based on particle swarm optimization, and EDAmv(Molina-Pérez et al., 2022) based on estimation of distribution.The versions of FROFI and SADE were combined with the cutting and repulsion strategy (CaR) proposed inLiu et al. (2021a), resulting in FROFI-CaR and SADE-CaR, respectively.Furthermore, we conducted experiments to evaluate the effectiveness of the gradient-based repair method in other state-of-the-art DE variants.Two recent approaches were considered for this purpose: chaotic local search-based differential evolution (CJADE)(Gao et al., 2019) and DE with CaR plus surviving solutions (DE-CaR+S)(Molina-Pérez et al., 2024).CJADE has demonstrated high efficacy in optimization problems with continuous variables, while DE-CaR+S has reported remarkable results in MINLP problems.We performed the following comparative analysis: CJADE-CaR (CJADE with CaR strategy) vs. G-CJADEmi (CJADE with gradient-based repair) and DE-CaR+S vs. G-DEmi+S (DE with gradient-based repair plus surviving solutions).

Figure 5
Figure 5 Time-velocity curves of the best solution from: (A) Sager (2005).(B) G-DEmi.(C) G-CJADEmi.(D) G-DEmi+CaR.The vertical dashed line indicates x 0 ¼ 1200 ft, and the horizontal dashed line corresponds to x 1 = 32.5 ft/s.The upper right quadrant of the axes depicts the path constraint.Full-size  DOI: 10.7717/peerj-cs.2095/fig-5 proposed an EDA with an adaptive-width histogram model and a learning-based histogram model to handle continuous and discrete variables, respectively.Sun & Gao (2019) proposed a probabilistic model to update the discrete variables in PSO.It uses social cognitive and self-cognitive coefficients to generate a probability distribution.Subsequently, Wang, Zhang & Zhou (2021) employed a learning-based histogram model for discrete variables in PSO.Li et al. (2021) proposed an EDA algorithm for mixed variables to find the best parameters of a convolutional neural network.Peng et al. (2021) implemented a histogram model in DE for handling discrete variables.

Repair Repair Target Trial Target Trial Better level Target Trial Target Trial Selection Selection Selection B) Worst level Best level Worst level Better level Better level Best level
The objective function f ðx g Þ Figure 2 Gradient-based repair method in MINLP context: (A) Standard DE variant with target-trial selection.(B) DE with gradient-based repair method.Full-size  DOI: 10.7717/peerj-cs.2095/fig-2Molina-Pérez et al. (2024), PeerJ Comput.Sci., DOI 10.7717/peerj-cs.209511/35 Algorithm 2 G-DEmi framework.Input: NP; CR; F; k max ; T min ; Eval max ; ðx g Þ and Gðx g Þ for each individual in P g ; g;i Þ and Gðu g;i Þ 11: if u g;i is better than x g;i then 12: store u g;i into P gþ1 ; 13: else if f ðu g;i Þ < f ðx g;i Þ and y u g;i = 2Y then 14: repair u g;i according to Algorithm 1; 15: evaluate f ðu g;i Þ and Gðu g;i Þ 16: if u g;i is better than x g;i then 24: end while Molina-Pérez et al. (2024), PeerJ Comput.Sci., DOI 10.7717/peerj-cs.largedimensionality (F8-F12 and F15, F16, and F18), or binary variables (F17, F22, and F23).Finally, F19 and F26-F28 are highly constrained problems.The details of these test functions can be found in Supplemental File S1, and a summary of their main features is presented in Table 1, where n is the number of variables of the problem; n 1 and n 2 are the

Table 1
Main features of the 28 benchmark MINLP problems.

Table 5
Performance comparison conducted over 30 independent runs with 200,000 OF evaluations for CJADE-CaR, G-CJADEmi, DE-CaR +S, and G-DEmi+S (Part 1 of 2).

Table 6
Performance comparison conducted over 30 independent runs with 200,000 OF evaluations for CJADE-CaR, G-CJADEmi, DE-CaR +S, and G-DEmi+S (Part 2 of 2).

Table 7
SOP results in 30 independent runs.